SE 


MICROCOPY  RESOLUTION  TEST  CHART 

NATIONAL  BUREAU  OF  STANDARDS- 1963- A 


•mmumcaiio 
Control  Syst< 


' ,  ■' 


** 


i.VX  fy-i 
T 


.-.kv  ■  . 

'■  i/5  '^V  ' 


v  s.'  ,’i;  r 


electrical  -and  Goi 


•-  .  ■? 

‘  •I  ?n 


BAYES  SMOOTHING  ALGORITHMS  FOR  SEGMENTATION 
OF  IMAGES  MODELLED  BY  MARKOV  RANDOM  FIELDS 


By 

Haluk  Derin^,  Howard  Elliott^, 

Roberto  Cristi  and  Donald  Geman' 

Department  of  Electrical  and  Computer  Engineering 
"^Department  of  Mathematics  and  Statistics 
University  of  Massachusetts 
Amherst,  Massachusetts  01003 

Technical  Report  #UMASS-ECE-Al$3-1 
August  1983  Qr 


SECURITY  CLASSIFICATION  or  Tmi*  PAGE  |m«  Omm  bMn« 


REPORT  DOCUMENTATION  PAGE 


|M  :  T 


DMASS-ECE-Aug83-1 


*.  T1TU*  faM  JMMIfteJ 


Bay*#  Smoothing  Algorithms  for  Segmentation  of 
Image#  Modelled  by  Markov  Random  fields 


»t*rOMMNO  ORGANIZATION  NAME  ANO  ADDRESS 


READ  INSTRUCTIONS 
BEFORE  COMPLETING  FORM 


S.  RECIPIENT'S  CATALOG  NUMBER 


*•  TYPE  OF  REPORT  A  PERIOD  COVERED 


Final 


«.  PERFORMING  ORO.  REPORT  NUMBER 


ontractor  grant  number^ 


N00014-83-K-0059 


10.  PROGRAM  ELEMENT,  project,  TASK 
AREA  A  WORK  UNIT  NUMBERS 


H.  CONTROLLING  OFFICE  NAME  ANO  AOORESS 


U.  REPORT  DATE 

August  1983 


NAME  •  AOORESSflf  dtttoro* I  I 


i  Commit*!  OMm)  IS.  SECURITY  CLASS,  (mt  Otto  rmport) 

Unclassified 


OCC^MSIPICATiON/ OOWNGRAOING 


•0.  DISTRIBUTION  STATEMENT  (*i  Ml*  Kopmt) 


APPROVED  FOR  PUBLIC  RELEASE:  DISTRIBUTION  UNLIMITED. 


17.  DISTRIBUTION  STATEMENT  (ml  <*•  — «irt  awN  la  SI  Hi  20.  It  Altera*  from  Rapartj 


If.  REV  BOROS  f 


i  mm  if  arimay 


Itenliy  *r  Maa*  i 


Markov  random  fields,  Bayes  smoothing, 
Image  segmentation 


20.  ami  PACT  f Cmmm  «•  itvwM  II  auMiwp  an  ftfarttffp  mm*  n«Nr) 

new  image  segmentation  algorithm  is  presented,  based  on  recursive  Bayes 
smoothing  of  Images  modelled  by  Markov  random  fields  and  corrupted  by 
independent  additive  noise.  The  Bayes  smoothing  algorithm  yields  the  a 
posteriori  distribution  of  ths  scene  value  at  each  pixel,  given  the  total  noli 
image,  in  e  recursive  way.  The  a  posteriori  distribution  together  with  a 
criterion  of  optimality  then  determine  a  Bayes  estimate  of  the  scene.  Exampli 
are  given  where  the  algorithm  is  applied  to  test  imagery  and  also  SEASAT  SAT. 


EDITION  or  I  ROW  «*  IB  OBSOLETE 

S.  N  0 102*  LF-OU-  4601 


unclassified 

SI CURlTv  CLASSIFICATION  OF  THIS  PAGE  'Who*  Owa  NwM) 


ABSTRACT 


A  new  image  segmentation  algorithm  is  presented,  based  on  recursive 
Bayes  smoothing  of  images  modelled  by  Markov  random  fields  and  corrupted 
by  independent  additive  noise.  The  Bayes  smoothing  algorithm  yields  the 
a  posteriori  distribution  of  the  scene  value  at  each  pixel,  given  the 
total  noisy  image,  in  a  recursive  way.  The  a  posteriori  distribution 

V  . 

together  with  a  criterion  of  optimality  then  determine  a  Bayes  estimate 
of  the  scene. 

The  algorithm  presented  is  an  extension  of  a  1-D  Bayes  smoothing 
algorithm  to  2-D  and  it  gives  the  optimum  Bayes  estimate  for  the  scene 
value  at  each  pixel.  Computational  concerns  in  2-D,  however,  necessitate 
certain  simplifying  assumptions  on  the  model  and  approximations  on  the 
implementation  of  the  algorithm.  In  particular,  the  scene  (noiseless  image) 
is  modelled  as  a  Markov  mesh  random  field,  a  special  class  of  Markov  random 
fields,  and  the  Bayes  smoothing  algorithm  is  applied  on  overlapping  strips 
(horizontal/vertical)  of  the  image  consisting  of  several  rows  (columns). 

It  is  assumed  that  the  signal  (scene  values)  vector  sequence  along  the  strip 
is  a  vector  Markov  chain.  Since  signal  correlation  in  one  of  the  dimensions 
is  not  fully  used  along  the  edges  of  the  strip,  estimates  are  generated 
only  along  the  middle  sections  of  the  strips.  The  overlapping  strips  are 
chosen  such  that  the  union  of  the  middle  sections  of  the  strips  gives  the 
whole  image. 

Different  versions  of  the  Bayes  smoothing  algorithm  based  on  different 
assumptions  on  the  Markov  random  field  are  implemented  and  applied  to  seg¬ 
mentation  of  some  two  level  test  images.  The  results  are  very  good  even 
for  very  low  signal-to-noise  ratios  and  compare  favorably  with  existing  seg¬ 
mentation  algorithms.  The  algorithms  are  also  applied  to  remotely  sensed 


i  i 

SAR  data  obtained  from  SEASAT,  yielding  remarkably  good  segmentation  of 
these  images  as  well. 


I.  INTRODUCTION 


Image  segmentation  and  restoration  have  received  enormous  attention  in 
the  image  processing  literature.  In  this  report,  we  present  an  image  segmen¬ 
tation  algorithm  based  on  Markov  random  field  (MRF)  models  for  the  image  and 
a  Bayesian  smoothing  approach  for  the  actual  processing.  The  image  is 
assumed  to  be  the  sum  of  the  realizations  of  two  independent  stochastic  pro¬ 
cesses:  the  "scene",  a  MRF,  and  the  noise  field,  consisting  of  iid  (indepen¬ 
dent  identically  distributed)  Gaussian  random  variables  (r.v.) .  The  Markov 
scene  model  provides  a  powerful  mechanism  for  incorporating  the  spatial 
dependence  of  pixels  in  relative  proximity  of  each  other. 

The  Bayes  smoothing  approach  used  in  the  algorithm  is  an  extension  to 
2-D  of  the  1-D  algorithm  presented  by  Askar  and  Derin  [1],  The  1-D  Bayes 
smoothing  algorithm  is  based  on  a  model  where  the  signal  is  a  Markov  process 
corrupted  by  independent  noise.  The  natural  extension  to  2-D  is  to  model 
the  scene  as  a  MRF,  although  extension  to  2-D  results  in  severe  computational 
complexity.  It  is  well  known  that  in  1-D  the  Bayes  smoothing  estimate  per¬ 
forms  better  (i.e.,  has  a  smaller  mean-squared  error)  than  the  linear  smoothing 
or  filtering  estimates.  In  particular,  it  minimizes  the  expected  cost  of  the 
estimate,  using  all  available  data  and  without  any  restriction  of  linearity. 

For  a  quadratic  cost  function  the  Bayes  estimate  is  the  a  posteriori  mean 
and  for  uniform  cost  function  the  Bayes  estimate  is  the  maximum  a  posteriori 
(MAP)  estimate. 

Some  of  the  earlier  work  on  image  segmentation  and  estimation  in  2-D  is 
due  to  Nahi  and  Assefi  [2],  Nahi  [3],  Habibi  [A],  Nahi  and  Franco  [5],  Woods 
and  Radewan  [6].  Most  of  this  work  involves  attempts  to  extend  Kalman 
filtering  ideas  to  2-D  by  devising  various  2-D  scanning  schemes  and  defining 
appropriate  autoregressive  models.  Although  Bayesian  estimation  is  mentioned 
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in  some  of  these  studies,  it  is  basically  a  linear  filtering  or  linear 
smoothing  that  is  being  sought.  To  reduce  the  prohibitive  computational 
load  in  2-D  certain  reasonable  suboptimality  assumptions  have  been  pro¬ 
posed  by  Woods  and  Radewan  [6],  namely,  Kalman  strip  processor  and  reduced 
update  Kalman  filtering.  Despite  the  Markovian  characteristics  of  some  of 
these  models,  the  connection  with  MRF  is  incidental  and  undeveloped. 

There  is  a  vast  body  of  literature  on  various  spatial  interaction 
models  and  image  restoration  and  parameter  estimation  algorithms  based  on 
these  models,  e.g.,  [7]-[10].  Woods  in  [7]  presents  an  autoregressive 
model  characterization  of  causal,  unilateral  Markov  random  field  models. 
Chellappa  and  Kashyap  [8],  [9]  present  an  extensive  exposition  of  simul¬ 
taneous  auto-regressive  (SAR)  and  conditional  Markov  (CM)  spatial  inter¬ 
action  models,  and  some  parameter  estimation  and  neighborhood  determina¬ 
tion  schemes  for  these  models.  They  also  report  on  linear  image  restora¬ 
tion  algorithms  for  the  SAR  and  CM  models.  For  a  detailed  treatment  on 
these  and  other  models,  we  refer  the  reader  to  these  references  and  to 
the  paper  by  Jain  [10],  on  image  processing  models,  and  to  references 
therein. 

MRF  models,  Bayesian  approach  and  MAP  formulations  in  the  context  of 
image  segmentation  have  been  used  before  by  other  researchers.  MRF  models 
and  MAP  formulation  is  combined  in  the  recent  work  by  Kaufman  et  al  [11] 
with  reduced  update  Kalman  filtering  techniques  and  in  Therrien  [12],  [13] 
with  two  dimensional  autoregressive  texture  models  for  texture  based 
segmentation.  Elliott  et  al  [14],  Elliott  and  Srinivasan  [15],  and 
Cooper  et  al  [16]  have  applied  MAP  techniques  to  boundary  estimation  in 
noisy  Images.  Hansen  and  Elliott  [171  use  MRF  models,  MAP  formulation  and 
some  simplifying  assumptions  on  the  model  to  devise  image  segmentation 


algorithms  based  on  two  stages  of  dynamic  programming.  In  the  recent  work 
by  Geman  and  Geman  [18]  and  Elliott  et  al  [19],  the  Gibbs  distribution 
characterization  of  MRF  is  used  to  develop  image  segmentation  algorithms. 

The  image  segmentation  algorithm  in  this  report  is  based  on  a  special 
conditional  characterization  of  MRF's.  The  objective  of  the  algorithm  is 
to  determine  recursively  the  a  posteriori  distribution  and  subsequently 
an  estimate  of  the  scene  value  (pixel  intensity)  at  each  pixel  given  the 
whole  (noise  corrupted)  image.  The  algorithm  yields  the  optimal  estimate 
of  each  pixel  value  in  the  scene  given  the  observed  image  under  quite 
general  assumptions,  namely,  a  MRF  model  for  the  scene,  and  independent 
noise.  However,  computational  complexity  of  the  algorithm  in  2-D,  necessi¬ 
tates  restrictions  on  the  class  of  MRF  models,  and  suboptimal  implementation 
of  the  algorithm.  The  model  for  the  scene  is  restricted  to  the  special 
class  of  MRF's  called  Markov  mesh  random  fields  (MMRF)  which  are  character¬ 
ized  by  causal  transition  distributions.  Furthermore  the  processing  is 
done  over  relatively  narrow  strips  rather  than  over  the  full  image  as  pro¬ 
posed  by  Woods  and  Radewan  [6]  in  a  Kalman  filtering  set  up.  Estimates 
are  obtained  at  middle  sections  of  the  strips  and  these  pieced  together  from 
overlapping  strips  yield  an  estimate  of  the  full  scene.  Various  versions 
of  the  algorithm  are  applied  to  some  binary  test  images  with  varying  noise 
levels  and  to  some  SAR  images  obtained  from  SEASAT.  The  segmentation  results 
even  for  low  signal-to-noise  ratio  images  are  remarkably  good. 

There  is  an  important  distinction  between  the  a  posteriori  distribution 
of  the  scene  value  at  each  pixel  based  on  the  full  image,  which  is  sought 
here,  and  the  scene  configuration  that  will  maximize  the  a  posteriori  dis¬ 
tribution  of  the  full  scene  given  the  full  image,  which  is  the  goal  of  some 
earlier  work.  Although  the  latter  may  seem  more  desirable,  the  full  scene 
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estimate  is  more  constrained,  and  more  susceptible  to  burst  type  errors. 

We  have  performed  extensive  simulations  in  1-D  comparing  the  performances 
of  these  approaches  and  observed  that,  in  most  cases,  the  individual  pixel 
estimates  were  just  as  good  as  those  of  the  full  scene.  Furthermore,  the 
former  approach  provides  the  additional  flexibility  to  compute  other  Bayes 
estimates  corresponding  to  other  cost  functions,  if  desired. 

This  report  is  organized  as  follows.  In  Section  II,  we  give  a  brief 
exposition  of  the  Bayes  smoothing  algorithm  in  1-D.  Then  we  present  in 
Section  III  the  problem  statement  and  the  MRF  model.  Extension  of  the 
Bayes  smoothing  algorithm  to  2-D  and  its  implementation  for  Markov  mesh 
random  fields  is  presented  in  Section  IV.  In  Section  V  the  algorithm  is 
applied  to  various  test  images  and  SAR  data,  followed  by  concluding  remarks 
in  Section  VI. 

II.  BAYES  SMOOTHING  ALGORITHM  IN  I-D 

Let  {X^}  denote  the  signal  to  be  estimated.  It  is  assumed  that  { 
is  a  Markov  sequence  with  the  known  transition  probability  density  function 
f(x^|x^_^)  and  initial  distribution  f(x^).  Note  that  throughout  this  report 
random  variables  are  denoted  by  capital  letters  and  their  realizations  by 
respective  lower  case  letters.  The  observation  at  each  step  is  given  as 

Yk  "  gk(Xk’  V  k  =  1,2, ... ,N  (1) 

where  is  the  observation  noise.  It  is  also  assumed  that  {W^}  is  an 
independent  sequence,  {X^}  and  {W^}  are  mutually  independent  and  that  is 
an  arbitrary  measurable  function. 

N 

Based  on  realizations  from  the  observation  sequence  y  =  { y, ,y0, . . . ,yv"  , 

"12  N 

it  is  desired  to  compute  an  estimate  of  the  signal  for  each  k.  A  Bayes 
estimate  -  J^(YN) ,  where  YN  -  {Y^Yj, ...  ,YH>,  is  one  that  minimizes  the 


expected  value  of  a  cost  function  CCX^jX^)  associated  with  the  estimation 


procedure,  namely 


EtCtX^Xj^}  -  |  C^.XjP  f(xk,yN)  dxfc  dyN 


Equivalently,  we  seek  to  minimize 


E{C(Xk,X|t)|YH  *  yN>  -  j  C(xk,^k)  f(xk|yN)  dxfe  (3) 

for  each  y*1.  So  it  is  necessary  to  determine  the  a  posteriori  distribution 

f(xklyN)* 

■  n 

Theoretically,  the  a  posteriori  distribution  f(x,  |y  )  is  determined  by 

N 

the  joint  statistics  of  the  random  sequence  X  =  (X1,X2> . . . , X^} ,  the  statis¬ 
tics  of  the  noise  sequence  and  the  functions  g^.  However,  in  order  for 
this  to  be  of  any  practical  value,  the  a  posteriori  distribution  f(x.  |y  ) 
must  be  determined  recursively  for  1  ±  k  £  N.  The  following  are  some  results 
along  these  lines  due  to  Askar  and  Derin  [1], 

Theorem  1.  Under  the  assumptions  stated  above  in  (1) 

f^XklXj’yN^  “  k  <  j  _<  N.  (4) 

where  y  *  (y^,y2* • • • fy^J  is  a  realization  of  the  random  sequence 
Yk  «  {Y1,Y2,...,Yk}. 

Theorem  2.  Under  the  assumptions  stated  above  in  (1) 

f(xJyN) '  £<*klyk>  I  t(Vi|y")  d\+i  <5> 

where  recursive  relationships  for  the  conditional  filtered  and  predicted 
densities  f(XjJyk)  and  respectively  were  determined  by  Ho  and  Lee 

[20]  as 
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i  k  f(yk^xk)  f(xklyk~1) 

f(xk|y  )  =  1 - ; - -  <6> 

Jf(yklxk)  f(xJy  >  d\ 

and 

f(xk+Jyk)  *  \  f(xk+llxk)  f(xklyk)  dxk  (7) 

The  proof  of  Theorem  1  and  proofs  of  the  recursive  relationships 
(5)  -  (7)  follow  from  Bayes'  rule  and  model  assumptions.  The  smoothing 
algorithm  given  in  (5)  is  a  backward  algorithm  where  k  is  running  from 
N-l  to  1.  So  the  smoothing  algorithm  involves  two  passes  over  the  data, 
one  forward  pass  during  which  the  conditional  filtered  and  predicted  den¬ 
sities  are  computed  according  to  (6)  and  (7)  and  stored,  and  one  backward 

pass  during  which  the  smoothing  a  posteriori  densities  are  computed.  Note 

N 

that  duriri  the  backward  pass  the  observation  sequence  y  is  not  explicitly 
used. 

The  general  implementation  of  this  smoothing  algorithm  could  lead  to 
storage  and  processing  difficulties.  However,  under  Gaussian  assumptions 
the  densities  are  characterized  by  a  few  parameters,  and  computation  and 
storage  is  straightforward.  Also,  if  the  signal  {X^}  is  a  sequence  of  r.v.'s 
taking  only  finitely  many  values,  then  the  integrals  reduce  to  finite  summa¬ 
tions  and  the  implementation  of  the  algorithm  is  relatively  straightforward. 
In  this  smoothing  algorithm,  attention  is  concentrated  on  computing 

f(x,  |y  )  the  a  posteriori  distribution  of  individual  signal  values,  given 

N 

the  full  observation  set  y  .  An  alternative  Bayes  estimation  scheme  would 
N|  n  N  N 

be  to  compute  f(x  |y  )  (or  equivalently  f(x  ,y  ))  yielding  a  Bayes  estimate 

of  the  whole  signal  sequence.  Recursive  expressions  for  these  densities 

can  easily  be  written  down  for  models  at  hand.  However,  the  actual  implemen- 

N 

tation  requires  computations  and  memory  in  the  order  of  K  where  K  is  the 
number  of  values  r.v.  takes  and  hence  it  is  not  feasible  to  compute 
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f(x^)y^)  unless  for  very  short  sequences,  even  for  binary  r.v.'s.  On  the 


N  N  i  N 

other  hand,  the  x  that  will  maximize  f(x  [y  )  can  be  computed  using  dynamic 


programming,  thus  obtaining  a  MAP  estimate  of  X  .  Such  a  MAP  formulation 
is  presented  by  Scharf  and  Elliott  [21]  and  used  in  image  segmentation  by 
Hansen  and  Elliott  [17]. 

An  extensive  simulation  in  1-D  was  carried  out  and  the  performance  of 

the  two  approaches  on  different  binary  signal  sequences  were  compared.  It 

i  N 

was  observed  that,  taking  the  mode  of  f(x.  |y  )  as  the  estimate  for  X^  for 
k  =  1,2,.,.,N  and  then  piecing  them  together  to  get  the  estimate  for  the 
full  scene  X^  performs  just  as  well,  if  not  better,  in  most  cases  as  the 
MAP  estimate  obtained  by  maximizing  f(x  |y  ).  Of  course,  both  estimates 
performed  much  better  than  a  simple  threshold  comparison  estimate  which  does 
not  exploit  the  dependence  in  the  signal  sequence. 


III.  PROBLEM  STATEMENT  AND  MARKOV  RANDOM  FIELD  MODELS 


Problem  Statement 


Suppose  the  image  is  a  random  field  Y  =  {Y^}  defined  over  a  finite 


x  N2  lattice  of  points  (pixels)  defined  as  L  ■  {(i,j):  1  <_  i  <_  N^, 

1  1  j  ^  N2}.  Suppose  further  that  the  image  random  field  Y  is  a  function 
of  a  scene  (true  picture)  random  field  X  and  a  corruptive  noise  random  field 
W,  each  defined  over  the  same  N^  x  N2  lattice.  The  functional  relationship 
between  the  random  fields  is  such  that  at  each  pixel  the  image  r.v.  is  a 
function  of  the  scene  r.v.  and  the  noise  r.v.  at  that  pixel,  that  is 


Yij  "  8ij(Xij,Wij) 


(k, j)  e  L 


(8) 


Although,  with  assumptions  that  will  be  stated  below,  the  model  allows 


for  an  arbitrary  measurable  function,  gjj  (♦,*),  e.g.,  g^  that  can  model  satur- 
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ation,  clipping  or  non-linearity  effects,  the  interest  is  mainly  in  the 
case  where  the  image  is  the  scene  plus  noise.  In  other  words. 


Yij  =  XiJ  +  Wij 


(i> j)  e  L 


or  in  matrix  form 


Y  =  X  +  W 


It  is  assumed  that  X  is  a  Markov  random  field  (MRF) ,  that  W  consists 
of  independent  r.v.'s  at  each  pixel,  and  that  W  and  X  are  mutually  inde¬ 
pendent.  The  common  distribution  of  the  W^.  and  the  joint  distribution  of 
X  is  unspecified. 

If  X^  represents  the  brightness  level  of  the  scene  at  pixel  (i,j),  it 
is  reasonable  to  assume  that  X^  will  depend  mostly  on  the  values  of  neigh¬ 
boring  pixels  and  that  the  correlation  between  the  values  of  two  pixels 
decreases  with  distance.  This  spatial  continuity  is  inherent  in  the  MRF 
model,  in  which  the  neighborhood  structure  can  be  specified  according  to 
the  degree  of  spatial  continuity  desired. 

Our  objective  is  to  determine  f(x^jy),  the  a  posteriori  distribution 
of  the  scene  at  pixel  (i,j)  given  the  full  image  y,  for  each  pixel  in  the 
lattice.  From  these,  an  estimate  of  each  X^  and  finally  combining  them 
an  estimate  for  the  full  scene  X  is  obtained.  In  this  study  the  estimate 
of  X^j  is  taken  as  the  mode  of  the  a  posteriori  distribution  f(x^|y),  which 
is  a  reasonable  choice  especially  for  binary  scenes  the  algorithm  is  applied 


For  the  model  under  consideration  the  a  posteriori  distribution  f(x^[v) 
can  be  determined  recursively,  as  will  be  explained  in  Section  IV.  However, 


due  to  the  amount  of  computation,  certain  approximations  are  made,  such  as 


processing  the  image  in  sections  (strips).  These  and  other  considerations, 
which  are  discussed  in  Section  IV,  lead  naturally  to  certain  subclasses  of 
MRF's  called  Harkov  mesh  random  fields  (MMRF) . 

Markov  Random  Fields  (MRF) 

The  foundations  of  MRF  lies  in  the  physics  literature  on  ferromagnetism, 
originating  in  the  work  of  Ising  [22]  in  1925.  We  will  give  only  the  basic 
definitions;  for  a  detailed  treatment  we  refer  the  reader  to  Kinderman  and 
Snell  [23]. 

Definition  3.1.  Let  L  be  a  finite  lattice 

L  =  { (i,j)  :  1  <  i  <  Nr  1  i  j  1  N2J  (11) 

Neighborhood  of  a  e  L.  denoted  by  H  ,  is  defined  as 

n  *  (b  e  L:  b  i*  a  and  a  e  n.  }  Va  e  L  (12) 

a  b 

And,  a  neighborhood  system  on  L  is  defined  as 

H  ■  (n  ;  a  e  L  (13) 

a 

Definition  3.2.  Given  a  finite  lattice  L  and  a  neighborhood  system  H  on 
L,  a  random  field  X  ■  {X&,  a  e  l)  is  a  Markov  random  field  if  and  only  if 

p<xa  ’  *X  -  V  b  C  L  "  U,)  ’  P(Xa  -  xa|Xb  ‘  V  b  '  V  (14) 


for  each  a  e  L. 

Elaborate  spatial  dependence  can  be  modelled  by  suitably  large  neighbor¬ 
hood  systems.  Indeed,  any  random  field  L  is  a  MRF  with  respect  to  the  neigh¬ 
borhood  system  1  for  which  ■  L,  V(i,j)  e  L.  However,  relatively  simple 
neighborhood  systems  are  adequate  in  modelling  most  scenes  of  interest, 
and  we  are  mainly  interested  in  the  two  simplest  neighborhood  systems: 


(i)  the  4  neighbor  system  f~,  and  (ii)  the  8  neighbor  system  1  ,  both 
depicted  in  Figure  1.  Certain  obvious  adjustments  need  to  be  made  for  the 
neighborhoods  of  the  pixels  along  the  boundaries. 

The  probabilities  on  the  left  hand  side  of  (14)  are  called  the  "local 
characteristics"  of  the  field,  and  it  can  be  shown  that,  for  any  random  field, 
these  local  characteristics  uniquely  determine  the  joint  distribution  P(X). 
Now  every  MRF  (with  P(X  =  x)  >  0  for  all  x)  is  also  a  so-called  "Gibbs  distri¬ 
bution"  (and  vice  versa) ,  and  this  equivalence  allows  us  to  write  down 
explicitly  the  joint  distribution  P(X  =  x) . 

It  also  follows  from  the  Gibbs  distribution  equivalence  that  the  joint 

1  2 

distribution  of  a  MRF  with  f  or  IT  neighborhood  system  can  be  expressed  as 

P(X  "  X)  *  (iJ)£L  pij(xij‘xi-l,j*  xi,J-l»xi-ltj-l)  (15) 

with  necessary  adjustments  for  i  »  0  or  j  =  0  or  both.  The  expression  in 
(15)  for  P(X  =  x)  can  be  obtained  as  an  extension  of  the  Hammersley  and 
Clifford  Theorem  (see  Besag  [24]).  Note  that  p^'s  in  (15)  are  not  unique 
and  calculation  of  a  set  of  them  may  be  extremely  difficult  due  to  the 
necessary  normalization.  Fortunately  computation  of  p^'s  is  not  necessary. 
The  important  point  is  that  such  a  representation  for  P(X  =  x)  exists. 

The  following  are  implied  by  the  representation  in  (15)  and  will  be 
used  in  the  smoothing  algorithm. 

Theorem  3.1.  Let  X  =  (X  ,  aeL)  be  a  MRF  on  lattice  L  with  neighborhood 

~  d 

1  2 

system  H  or  If  .  The  columns  (and  rows)  of  this  MRF  constitute  a  vector 
Markov  chain. 


This  is  an  interesting  and  potentially  useful  result  concerning  finite 
1  2 

lattice  MRF's  with  II  or  f  neighborhood  systems. 


Theorem  3.2.  For  {X  ,  aeL}  a  MRF  on  lattice  L  with  neighborhood  system 

cl 

-1  «2 
H  or  U 


P(xa,aeA| x^,beB)  =  P(xa,acA|x^,b£C) 
where  A,  B,  and  C  are  subsets  of  L  shown  in  Figure  2. 


(i,1) 

n  -  4  neighbor  system 


ri  -  8  neighbor  system 


1  2 

Figure  1.  n  and  n  neighborhood  systems 


Figure  2.  Subsets  of  L  that  are  used  in  Theorem  3.2 
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Markov  Mesh  Random  Field  (MMRF) 

Neither  the  conditional  distribution  nor  the  joint  distribution  charac¬ 
terizations  of  MRF's  are  immediately  applicable  in  the  Bayes  smoothing  algor¬ 
ithm  that  is  being  proposed.  What  is  needed  is  a  characterization  more 
directly  causal  than  (15) ,  which  brings  us  to  the  so-called  Markov  mesh 
random  fields  (MMRF)  introduced  by  Abend  et  al  [25]  and  Kanal  [26].  Similar 
causal,  unilateral  characterizations  have  also  been  used  by  Kashyap  and 
Chellappa  [8],  [9]. 

Let  L  be  the  lattice  described  above.  We  define 


Ai  j  *  <  i  or  £  <  j}  for  (i,j)eL 

(17) 

B1>  =  {(k,£)eL|k  <_  i  and  £  <  j  ,  (k,£)j*(i, j) }  for  (i,j)eL 

A  MMRF  is  defined  by  the  property 

p(xij lxa*  aeAtj)  ■  P(xijlxa»aeCi  j)»  (18) 

where  C.  .is  a  subset  of  B.  .  and  determines  the  particular  type  of  MMRF. 
i»j  i* J 

Such  a  random  field  has  the  following,  very  useful  property: 

P(X  *  x)  »  n  P(x,  4  j x_ »  aeC.  4)  (19) 


(i, j)eL 


ij'  a*  i,j- 


This  is  the  causal  relationship  we  need  for  the  smoothing  algorithm. 

Corresponding  to  any  choice  of  {C  there  is  a  family  {D.  ,}  such  that  th. 

1  *  J  i  >  J 

MMRF  satisfies 


P(Xj,j  -  xijlXa  "  xa*  aeL  “  td.J))) 


P(Xij  "  xi j ' Xa  "  V  acDi,j) 


Moreover,  the  (D.  .}  constitute  a  neighborhood  system,  and  hence  it  follows 

J 

from  (20)  that  a  MMRF  is  a  MRF.  The  {D,  .}  corresponding  to  a  particular 

*■  *  J 


Vi 


choice  of  (C.  .}  can  be  determined  using  (19). 

Of  special  interest  to  us  is  the  case  in  which  C  ,  .  *  {(i, j-1), (i-1, j-1; , 

J 

2 

(i-l,j)}.  The  corresponding  {D.  . }  is  the  8-neighbor  system  H  .  For  this 

i,  j 

choice  of  C.  ,,  the  joint  distribution  in  (19)  is  equivalent  to  (15),  a  joint 
i»  J 

1  2 

distribution  expression  for  a  MRF  with  IF  or  IF  neighborhood  system.  The 

subsets  of  the  lattice  relevant  to  a  MMRF  and  the  particular  C  .  are  shown 

1 » 1 

in  Figure  3.  Some  of  the  interesting  properties  of  this  particular  MMRF  are 
given  below: 

Theorem  3.3.  For  a  MMRF  with  C,  ,  =  { (i, j-1) , (i-1, j-1) , (i-1, j) }  the  following 

*  *  J 

are  true: 

(i)  P(Xa  -  xa,  -  V  bcPt,J> 

*  p(*a  *  V  aeEi,jlXb  '  V  beGiJ> 

’  "  P«,  *  *.l*b  "  V  bECa> 

“Ei,J 

(ii)  P(Xa  -  xa,  aeEj  jlXk  -  x,,,  beH^) 

*  p«a  ■  V  ’  V  krtiU> 


'4-1*4  4.1  .*4  1>  n  P(xalxb»  bECa) 

ij  I,j  i  i,j  I  1,1  -U,j}  b  a 

J 


The  sets  E.  , ,  F.  , ,  G.  , ,  H.  .  and  K.  .  are  shown  in  Figure  4. 

!>]  1>J 

The  utility  of  the  MMRF  model  which  stems  from  the  causal  relationships 
above  (especially  of  course  (19)),  are  already  discussed  in  regard  to  strip 
processing.  In  addition,  these  models  may  be  readily  simulated  and  constituu 
a  surprisingly  rich  subclass  of  MRF’s. 

Pickard  Random  Field 


There  is  yet  another  interesting  subclass  of  MRF’s  first  noted  by 
Pickard  [27]  as  a  "curious  binary  lattice  process".  Let  us  call  it  a 
Pickard  random  field  (PRF).  It  is  fully  characterized  by  the  joint  distribution 


Figure  3.  Subsets  of  L  describing  a  MMRF 


Figure  4.  Subsets  of  L  that  are  used  in  Theorem  3.3 


(23) 


of  four  r.v. 's  on  a  2  x  2  block,  (*a  ^b) ,  with  the  property  that 

c  d 

P(Xb  ■  *b|xa  ‘  Vxc  ■  *c>  -  P(Xb  ■  *JXa  ‘  *«> 

For  the  binary  case,  the  joint  distribution  of  the  block  is  then  completely 
specified  by  three  parameters:  P(Xfl  =  1),  P(X^  =  1 | =  1)  and 
P(Xd  *  l|Xc  "  1»  -  1,  Xfa  ■  1) ,  subject  to  certain  constraints. 

A  PRF  is  then  defined  constructively.  From  the  joint  distribution  of 
the  2x2  block,  initial  and  transition  distributions  of  a  2-dimensional 
vector  Markov  chain  are  generated,  giving  rise  to  a  2  x  ^  random  field, 
namely  this  chain  considered  over  the  index  set  [1,2, . . . ,N^] .  Finally 
initial  and  transition  distributions  of  an  -  dimensional  vector  Markov 
chain  are  generated,  thus  providing  in  the  usual  way  the  joint  distribution 
of  an  x  Nj  random  field.  The  random  field  so  generated  can  easily  be 
made  to  be  stationary  and  isotropic. 

Here  are  some  of  the  relevant  properties  of  a  PRF.  For  a  detailed 
treatment.  Including  some  of  the  proofs  see  Pickard  [27]. 

Theorem  3.4.,  The  PRF  described  above  has  the  following  properties: 

(i)  The  rows  (columns)  in  each  set  of  k  consecutive  columns  (rows)  of 
the  random  field  form  a  k-dlmensional  vector  Markov  chain. 

(II)  P(X,  -  MEi.jlXh  *  V 

'  P(Xa  '  V  aEEl,jlXb  ‘  V  bc  Ql,j>  <24> 

(III)  PCX,  -  xa,  ase^lx,,  -  v  bcHt (J) 

-p<x„-V  aEEl,A  *xb'beQl,J)  (25) 

Sets  E.  .,  M.  Q,  .,  and  H,  ,  are  shown  in  Figure  5. 

J-pj  ipj  **j 

Theorem  3.5.  Any  PRF  is  an  MURF,  and  hence  an  MRF. 


The  proof  of  this  interesting  result  is  lengthy  but  straightforward  and 
hence  will  not  be  Included  here. 

Actually,  the  PRF  is  too  special  to  model  most  of  the  scenes  of 
interest  to  us,  even  after  an  extension  past  the  binary  case.  We  have 
included  this  brief  discussion  in  order  to  illustrate,  in  an  especially 
concrete  way,  various  properties  of  PRF's  that  are  relevant  for  the 
smoothing  algorithm  described  in  the  next  section. 

IV.  BAYES  SMOOTHING  ALGORITHM  IN  2-D 


Recall  that  the  objective  of  the  algorithm  is  to  determine  the 


a  posteriori  distribution  f(x^|y)  recursively,  where  the  image  Y  consists 


of  a  MRF  X  on  the  N^  x  N^  lattice  corrupted  by  additive  independent  noise, 
see  (8)  -  (10)  in  Section  III. 

The  Bayes  smoothing  algorithm  in  1-D  (described  in  Section  II)  remains 
valid  if  all  quantities  involved,  namely  X^  Wfc  and  Yfe,  are  random  vectors, 
so  that  {Xk}  is  a  vector  Markov  chain  and  {W^}  is  a  sequence  of  iid  random 


vectors  and  independent  of  {X  },  The  components  of  the  random  vectors  {W  } 


need  not  be  independent,  but  the  computation  of  f(y^|x^)  is  drastically 
simplified  when  they  are.  Thus,  straightforward  application  of  the  1-D  Bayes 
smoothing  algorithm  on  the  vector  processes  recursively,  yields  f(XjJy), 
the  a  posteriori  distribution  of  the  random  vector  X^  (possibly  a  column  of 
the  scene)  given  the  observation  random  field  Y  (possibly  the  noise  corrupted 
image).  The  a  posteriori  distribution  f [ y )  of  the  r.v.  X^,  the  ith  entry 
of  the  kth  vector,  can  be  simply  obtained  from  f(x^jy)  by  integrating  over 
the  other  components  of  X^. 

Thus,  we  have  a  recursive  scheme  to  determine  the  a  posteriori  distri¬ 
bution  of  the  individual  pixel  values  given  the  noisy  image,  under  the  addi- 
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tlonal  assumption  that  columns  of  the  scene  constitute  a  vector  Markov 

chain.  VJe  have  pointed  out  in  Section  III  that  the  columns  (or  rows)  of 
1  2 

a  MRF  with  1  or  1  neighborhood  system  do  constitute  a  vector  Markov  chain 
and  thus  for  such  MRF's  the  a  posteriori  distribution  of  individual  pixel 
values  given  the  full  image  can  be  obtained  exactly.  Specifying  a  cost 
function  and  determining  the  corresponding  estimate  for  each  pixel,  this 
is  the  "best"  that  can  be  done  in  estimating  pixel  values  using  the  observed 
image  under  the  assumptions  of  the  model. 

Despite  the  optimistic  tone  of  the  previous  paragraph  there  are  over¬ 
whelming  computional  difficulties  in  implementing  the  smoothing  algorithm 

on  the  columns  of  an  image  matrix.  Even  for  a  binary  scene  on  a  128  x  128 

128 

lattice,  at  each  step  of  the  algorithm  there  are  on  the  order  of  2  calcula- 

1 28 

tions  necessary,  because  the  column  vector  has  2  possible  values. 
Obviously,  drastic  simplifications  are  necessary. 

Inspired  by  the  ideas  of  Woods  and  Radewan  [6]  on  Kalman  strip  processor 
we  propose  to  process  the  image  in  relatively  narrow  strips,  i.e.,  groups 

i  » 

of  rows.  Implementing  the  algorithm  on  a  strip  will  yield  f(xk|y  )»  the  a 

V 

posteriori  distribution  of  X^,  the  kth  column  restricted  to  the  strip,  given 

»  i  « 

Y  *  y  ,  Y  being  the  restriction  of  the  image  to  the  strip.  Clearly, 

* .  *  i . 

f(x^|y  )  +  f(x,|y).  However,  due  to  the  fact  that  the  dependence  between 
pixels  usually  decreases  as  the  distance  between  pixels  increases,  we  will 
assume  that  the  a  posteriori  distribution  of  the  pixels  along  the  center 


section  of  the  strip  will  not  be  influenced  very  much  by  the  observed  image 

It  I 

values  outside  the  strip.  Denoting  by  X^  the  restriction  of  X^  to  the  centc 
section  of  the  strip,  it  is  thus  assumed  that 


f(x”|y)  i  f(x^y’) 


(26) 
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This  is  a  reasonable  assumption  whenever  pixels  that  make  up  are 
"far  away"  from  those  outside  the  strip,  as  measured  by  the  effective 
"correlation  length"  of  the  field. 

The  a  posteriori  distribution  f(x^jy’)  is  simply  obtained  by  applying 
the  algorithm  on  a  strip.  Then  integrating  f(x^jy’)  over  the  entries  of 
that  are  outside  the  center  section  of  the  strip  gives  f(x£|y'),  which 
in  turn  is  approximately  equal  to  f(x£|y).  The  a  posteriori  distribution 
of  the  individual  pixel  values  that  are  entries  of  X£  are  simply  obtained 
by  integrating  f(x£|y')  over  all  the  other  entries  of  X£.  The  estimates 
of  the  pixel  values  that  are  entries  of  are  then  determined  from  the 

corresponding  a  posteriori  distribution  to  be  the  mode  or  the  mean  or  some 
other  parameter  of  the  a  posteriori  distribution  depending  on  the  specified 
cost  function  or  the  estimation  strategy. 

The  strips  are  arranged  such  that  the  union  of  the  center  sections 
is  equal  to  the  full  scene.  Thus,  the  estimates  along  center  sections 
combined  provide  an  estimate  of  the  full  scene.  The  processing  along  the 
strips  can  be  done  simultaneously. 

A  drawback  to  this  approach  is  the  assumption  that  the  column  vectors 
{X£}  along  the  strip  constitute  a  vector  Markov  chain,  which  is  not  neces¬ 
sarily  true  for  an  arbitrary  MRF.  Moreover,  even  if  {X^}  is  a  vector  Markov 
vector  chain,  the  transition  distribution  P(x£|x£_^)  of  this  vector  Markov 
chain  must  be  known  in  order  to  be  able  to  use  the  algorithm. 

To  address  both  of  these  issues  we  concentrate  on  the  Markov  mesh 
random  field  models  discussed  in  Section  III.  Looking  at  Theorem  3.3  (ii), 
we  see  that  the  column  vectors  along  a  strip  are  not  exactly  a  vector  Markov 
chain.  But  if  we  overlook  the  dependence  of  on  pixels  other  than  the 
adjacent  one,  in  other  words,  if  we  assume  that 
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P(xijlXi,j-l*Xi,j-2*‘"*xi,l)  =  P<XijlXi,j-l)  (27) 

then  it  follows  from  (22)  that  the  columns  along  the  strip  do  constitute 
a  vector  Markov  chain  and  (22)  gives  a  simple  explicit  expression  to  com¬ 
pute  P(x^|x£  p  in  terms  of  P(x^j  |x^,  beC^  p's,  the  causal  transition  dis¬ 
tribution  characterizing  the  MMRF,  and  the  transition  distribution 

p(xlllxi,l-l)- 

Aside  from  the  approximation  expressed  in  (27)  which  is  not  an  unreason¬ 
able  one  in  a  Markov  setting,  starting  from  a  joint  distribution  on  a  2  x  2 
block,  the  building  blocks  of  the  MMRF,  PCx^x^,  beC^  p  for 
C±  j  *  { (i,.1-l)  ,  (i-1,  j-1)  ,  (i— 1,  j)  >  and  PU^^  ,._p  needed  in  (22)  are 
computed  and  hence  P(x/|x/  p  is  computed  using  (22).  Thus,  with  a  MMRF 
model  and  with  the  approximation  of  (27)  the  Bayes  smoothing  algorithm  is 
readily  implemented  over  strips  of  the  image.  Let  us  denote  this  version  of 
the  algorithm,  in  which  strips  can  be  processed  simultaneously,  by  Algorithm  A. 

It  should  be  noted  that  for  Pickard  random  field  (PRF)  described  in 
Section  III,  the  desired  properties  of  the  strip  processor,  namely 

(i)  columns  along  a  strip  are  a  vector  Markov  chain, 

(ii)  p  explicitly  determined 

are  both  valid  without  any  approximation.  But  the  PRF  model  is  so  restric¬ 
tive  that  the  segmentation  so  obtained  for  our  scenes  is  not  as  good  as 
that  obtained  by  various  MMRF  models,  making  the  approximation  in  (27). 

The  restrictive  nature  of  the  PRF  is  illustrated  by  the  following  example. 

Consider  a  binary  scene  with  P(X  =  1)  ■  0.5  and  P(X  =  II X.  .  .  *  1)  =  0.8. 

ij  ij  1 

It  follows  from  the  model  constraints  that  P(X^  =  ljX^  =  1,  X^_^  -  I, 

X^_^  j  ■  1)  _>  0.9375  and  that  it  has  to  be  less  than  0.94  for  spatial  con¬ 
tinuity  constraints  such  as  P(X.  =  l|x  ,  ,  ■  0,  X,  ,  ,  ,  =  1,  X  ,  .  =  0) 
being  relatively  small  (^  0.05).  For  these  reasons,  the  PRF  model  will  not  be 
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discussed  any  further. 

Alternate  Version  of  Bayes  Strip  Processor  Algorithm 


For  a  MMRF  with  C  =  {(i,j-l),  (i-l,j-l),  (i-l,j)}  the  following 
i»  3 


relationship  holds: 

P‘Xa-V  V  bE 

'  P<Xa  ’  V  aepimj “  V  bEQi, jUSi,3> 


=  n 


aeE .  .  P<Kalxb'  bECa> 

i.3 


(28) 


The  sets  E  , ,  H ,  , ,  R,  , ,  Q .  .  and  S  .  are  shown  in  Figure  6.  In  simple 

ii] 

terms,  this  means  that  given  the  row  above  the  strip,  the  columns  of  the 
strip  form  a  vector  Markov  chain,  and  that  P(x£+1|xk,  *  where  x  denotes 

the  row  above  the  strip,  is  readily  obtained  in  terms  of  the  building 
blocks  (causal  transition  distribution)  P(xa|x^,  beCa)  of  the  MMRF. 

It  is  also  true  that  the  Bayes  smoothing  algorithm  (4)  -  (7)  described 
in  Section  II  is  still  valid  for  a  MMRF  model  with  vector  processes  (a  strip) 
and  with  x,  the  row  above  the  strip,  given,  meaning  that  x  is  simply  added  to 
all  the  conditional  distributions  in  (4)  -  (7).  The  transition  distribution 
P(x’+i|x£,x)  is  computed  using  (28).  So  the  algorithm  in  this  case  yields 
f(x£|y',  x )  recursively. 

In  order  to  apply  this  version  of  the  algorithm,  however,  we  need  to 
know  x  the  row  above  the  strip.  Clearly  this  information  is  not  available. 
But  an  estimate  x  of  x  is  determined  during  the  processing  of  a  previous 
strip.  So  if  the  estimate  x  is  used  instead  of  x>  f(x^|y',x)  is  obtained 
as  an  approximation  to  the  a  posteriori  distribution  f(x^|y',x)*  Then 
integrating  f(x^|y',jc)  over  the  variables  that  are  not  to  be  estimated, 


the  a  posteriori  distribution  of  the  variables  to  be  estimated  is  obtained. 


2\l 


Using  the  estimate  of  the  row  above  the  strip  is  equivalent  to 
adjoining  a  boundary  condition  to  the  segmentation  problem.  The  variables 
to  be  estimated  along  the  strip  could  again  be  along  the  center  section  of 
the  strip  or  could  be  along  the  top  of  the  strip  adjacent  to  the  "row  above", 
whose  estimate  is  used.  Let  us  refer  to  these  two  new  versions  of  the 
algorithm  as  Algorithm  B  and  Algorithm  B'  respectively.  Note  that  Algorithms 
B  and  B'  have  a  sequential  nature  because  of  the  use  of  the  previous  estimate  x. 

In  Section  V,  we  will  report  on  the  application  of  Algorithms  A,  B  and  B' 
to  obtain  binary  segmentation  of  some  noisy  test  images  and  some  remotely 
sensed  SAR  images  obtained  by  SEASAT. 

At  this  point,  it  would  be  in  order  to  summarize  all  the  assumptions 
and  approximations  made  to  obtain  an  implementable  2-D  Bayes  smoothing  algorithm. 
Aside  from  the  general  assumptions  of  the  scene  being  a  MRF  on  a  finite 
lattice  corrupted  by  additive  independent  noise  the  following  assumptions  and 
approximations  are  made: 

1  2 

i.  Scene  is  a  MRF  with  U  or  IF  neighborhood  system. 

ii.  Scene  is  a  MMRF  with  "support  set"  C  .  =  {(i,j-l),  (i-l,j-l),  (i-l,j)} 

i  >  J 

2 

(a  special  case  of  a  II  neighborhood  MRF) . 

iii.  Pixels  far  apart  tend  to  be  independent  and  as  a  consequence  of  this 

f(*J|y)  -  f(x£|y'). 

iv.  P(x. ,|x.  .  ,  ,x  .  9,...,x.  ,)  1  P(x. . |x,  .  (assumed  in  Algorithm  A). 

A 

v.  x,  estimate  of  the  row  above  the  strip  is  used  instead  of  x>  the  row 
above  itself  in  all  the  conditional  distributions  (assumed  in  Algorithms 
B  and  B ’ ) . 
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V.  BINARY  SEGMENTATION  USING  BAYES  SMOOTHING  STRIP  PROCESSOR  ALGORITHMS 

The  scene  X  Is  assumed  to  be  a  binary  (equivalently  two-level:  £„) 

MKRF  with  support  set  C  -  {(i,j-l),  (i-l,j-l),  (i-l,j)}.  The  joint  dis- 

1  *  J 

tribution  of  a  2  x  2  block  from  the  scene  is  specified  as  follows: 


P(  1  }  >  *  V  p<  \  l  >  -  V  P(  1  0  }  =  q3 

(29) 

P(  o  x  ^  “  q4*  P*  0  0>«q5>  p(  o  0>=q6 

The  joint  distribution  of  the  2x2  block  is  invariant  under  multiples  of  90c 
rotations.  Various  transition  probabilities  necessary  for  the  algorithms 
are  determined  in  terms  of  the  parameters  q^,  q2»-*»  q^>  Different  degrees 
of  spatial  continuity  can  be  modelled  by  a  proper  specification  of  q^  par¬ 
ameters  . 


The  scene  is  corrupted  by  additive,  independent,  zero-mean  Gaussian  noise. 

In  other  words,  the  r.v. 's  constituting  the  noise  field  are  assumed  to  be 
2 

lid  and  N(0,c  ).  For  the  sake  of  the  Computer  Vision  system  used  it  is 

further  assumed  that  the  noise  takes  discrete  values  with  probabilities 

2 

proportional  to  that  of  N(0,o  ),  upon  proper  normalization.  The  segmentation 
algorithm  is  applied  to  images  with  varying  noise  levels.  The  signal  to 
noise  ratio  (SNR)  of  an  image  is  defined  as 

*2  "  £1 
run  _  r. 


where  and  are  the  levels  of  the  binary  scene.  For  the  test  images 

segmented  the  quantities  anc^  0  are  aH  prespecified  and  hence  known. 

For  the  real  SAR  images,  they  are  estimated  by  assuming  that  the  image  is  a 

2 

mixture  of  two  Gaussian  distributions  with  means  and  ^  an<*  variance  o  . 


In  all  the  segmentation  examples  reported  here,  the  strips  are  taken 
as  3-row  wide  with  the  "center  section"  of  the  strip  being  the  middle  row. 
Algorithm  can  be  implemented  with  greater  strip  widths  but  the  computation 
times  become  prohibitive  for  strip  width  greater  than  5.  However,  as  will 
be  illustrated,  remarkably  good  segregation  results  are  attained  even  with 
3-row  wide  strips.  Both  the  test  images  and  the  SAR  images  segmented  are 
taken  to  be  64  x  64.  Larger  Images  can  be  processed  with  computation  times 
proportional  to  the  total  number  of  pixels  in  the  image.  Segmentation  of 
64  x  64  images  with  3-row  wide  strips  takes  in  the  order  of  30  seconds  on 
a  Computer  Vision  System  supported  by  a  VAX  11/780  computer. 

Taking  3-row  wide  strips  in  Algorithm  A  the  scene  values  are  estimated 
along  the  center  row  using  the  (noisy)  image  values  on  the  3  row  strip.  In 
Algorithm  B,  similarly  center  row  scene  values  are  estimated  using  the  3  row 
strip  and  the  estimate  of  the  "row  above".  In  Algorithm  B',  again  using 
the  image  on  the  3-row  strip  and  the  estimate  of  the  "row  above",  scene 
values  along  the  top  row  of  the  strip  are  estimated.  In  all  cases,  the 
estimate  is  taken  as  the  mode  of  the  corresponding  a  posteriori  distribution. 

Following  the  segmentation,  the  estimated  scene  is  post  processed  by  a 
3x3  median  filter.  This  post  processing  helps  eliminate  some  of  the  specks, 
i.e.  the  isolated,  presumed  to  be  erroneous  estimates.  But  while  eliminating 
erroneous  specks  some  new  errors  may  be  generated  due  to  a  resulting  decrease 
in  scene  detail.  A  desired  degree  of  scene  detail  together  with  possibly 
erroneous  speckles  can  be  attained  by  a  proper  choice  of  the  q.^  parameters. 
Different  segmentation  results  based  on  different  sets  of  parameters  are 
illustrated  in  the  figures. 
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In  Figures  7-22,  segmentation  of  some  test  images  with  different 
SNR's  using  Algorithms  A,  B  and  B'  are  presented.  The  arrangement 
of  the  four  sections  of  these  images  is  described  in  Table  I.  The  SNR 
of  the  noisy  image,  the  type  of  the  algorithm  used  and  the  parameters 
used  are  all  specified  separately  for  each  Figure.  The  three  versions 
of  the  algorithm  yield  similar  results  for  high  to  medium  SNR's  but  show 
distinctively  different  characteristics  for  low  to  very  low  SNR's.  It 
is  seen  from  these  Figures  that  segmentation  results  are  excellent  for 
high  to  medium  SNR's  (e.g.,  SNR  *  7  or  1)  and  resonably  good  for  low-to- 
very  low  SNR's  (e.g.  SNR  -  0.7  or  0.5).  Some  directionality  features  are 
apparent  in  the  segmentation  for  low  SNR's.  It  is  believed  that  this 
undesirable  effect  can  be  eliminated  or  reduced  by  increasing  the  strip 
width  and  by  properly  choosing  the  parameters. 

During  experimentation  with  the  segmentation  algorithm,  we  observed 
that  the  results  are  somewhat  sensitive  to  the  parameters  q^'s.  The  set 
of  parameters  which  yields  a  "good"  segmentation  depends  on  the  noise 
level  and  the  level  of  detail  (object  sizes)  in  the  image.  In  this  study, 

TABLE  I 

DESCRIPTION  FOR  FIGURES  7-22 

(a)  Noiseless  test  image. 

(b)  Gaussian  noise  added  to  (a)  with  specified  SNR. 

(c)  Segmentation  of  (b)  by  the  specified  algorithm. 

(d)  (c)  Post-filtered  by  3  x  3  median  filter. 


Figure  8.  Segmentation  with  Algorithm  B,  SNR*2,  q^O.42,  q2«q5*0.005, 
q^-O.Ol,  q^"e,  and  qg-0.5. 


a 


b 
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c 


d 


Figure  11.  Segmentation  with  Algorithm  B,  SNR=1,  q^=0.42,  q^^q^-O.OOS, 
q3»0.01,  q^-e,  and  qfi=0.50. 


a 


b 


Figure  13.  Segmentation  with  Algorithm  A,  SNR=0.7,  q 
q 2=0. 01,  q^-e.  and  q6=0.62. 


Figure  14.  Segmentation  with  Algorithm  B’,  SNR=0.7, 
q  =0.025,  q,=e,  and  q,=0.44. 


Figure  17.  Segmentation  with  Algorithm  A,  SNR=2 
q^*e,  and  q6“0.44. 


a 


b 


c 
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Figure  18.  Segmentation  with  Algorithm  B,  SNR-2 
q^-e,  and  q6»0.44. 


Figure  19.  Segmentation  with  Algorithm  B  ,  SNR=2 
and  q^=e. 


Figure  20.  Segmentation  with  Algorithm  A,  SNR=1, 
q»*0.005,  q, mG,  and  q,=»0.48. 


Figure  21.  Segmentation  with  Algorithm  B,  SNR=1, 
.015,  q^*=e,  and  qg=0.42 


Figure  22.  Segmentation  with  Algorithm  B',  SNR=1 
q^*e,  and  qg“0.40. 
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we  did  not  try  to  develop  a  scheme  to  estimate  the  parameters  that  will 
give  the  "best"  segmentor.  Instead,  based  on  a  few  trials,  we  chose  sets 
of  parameter  values  that  gave  reasonably  good  segmentation  results. 

Increasing  and  q^  and  equivalently  decreasing  the  others  has  a  stiffening 
effect  which  reduces  the  details  and  the  specks  presumed  to  be  noise. 

The  segmentation  algorithm  is  also  applied  to  some  remotely  sensed 
data,  namely  to  some  synthetic  aperture  radar  (SAR)  data  obtained  by  SEASAT. 
Such  an  SAR  image  provided  by  Office  of  Naval  Research  is  shown  in  Figure 
23.  Segmentation  Algorithm  A  is  applied  on  three  64  x  64  portions  of  this 
image,  shown  in  frames  in  Figure  23.  The  segmentation  results  of  these 
three  images  by  Algorithm  A  is  presented  in  Figures  24-26.  The  sections 
of  these  Figures  are  arranged  such  that  in  section  (a)  is  the  actual  SAR 
image,  in  sections  (b) ,  (c),  and  (4)  are  segmentations  of  (a)  using  q^^ 
parameters  getting  stiffer  in  that  order.  We  see  that  as  the  q^  para¬ 
meters  are  made  stiffer  some  of  the  image  details  as  well  as  some  specks 
presumed  to  be  noise  are  eliminated.  The  q^  parameters  can  be  chosen 
according  to  the  level  of  detail  desired  in  the  segmented  image.  Again 
the  actual  image  is  assumed  to  be  a  mixture  of  two  Gaussian  distributions 
and  the  parameters  of  these  Gaussians  are  estimated  and  used  in  the  seg¬ 
mentation  algorithm.  The  q^  parameters  used  are  specified  following  each 
figure.  In  all  cases  q^  =  10  ^  is  taken  and  this  amount  is  properly  sub¬ 
tracted  from  other  q^’s  so  that 

ql  +  4q2  +  4<13  +  2q3  +  +  ^6  =  1 

is  insured.  It  is  clearly  seen  in  Figures  24-26  that  the  algorithm  yields 
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Figure  23.  SAR  image  of  Chesapeake  Bay  area. 


a  b 


c  d 

Figure  24.  Segmentation  of  SAR  image  by  Algorithm  A.  (a)  image,  (b) 

q2-q3«q5-°.05,  q4»e,  (c)  q1*q6=0.32,  q2=q3=q5*0.03,q4=e, 
(<0  q^qg-0.44,  q2=q3-q5-0.01,  q 4=e . 


Figure  25.  Segmentation  of  SAR  image  by  Algorithm  A.  (a)  image,  (b)  qjL=q6=0.20 
q2=q3=q5=0.05,  q4=e,  (c)  q]=q6=0.32,  q2=q3=qs=0.03,  q4=e, 

(d)  qi=q,=0.44,  q,=q_=q,=0 .01 ,  q,=e. 


Figure  26.  Segmentation  of  SAR  image  by  Algorithm  A.  (a)  image,  (b)q  =q,=0.20 
<J2“^3*CJ5*0.05,  q4=e,  (c)  q^=qg=0.32,  q2”q3*q 5*0.03,  q4*£ 

(d)  q1",qfe“0.44,  qo=q,=qc=0.01,  q4=e. 
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remarkably  good  segmentations  of  remotely  sensed  data  as  well.  The 
algorithm  can  be  incorporated  with  a  parameter  estimation  scheme  so  that 
a  completely  automated  implementation  of  the  algorithm  is  feasible. 

VI.  CONCLUDING  REMARKS 

In  this  report  we  presented  a  new  segmentation  algorithm  based  on 
recursive  Bayes  smoothing  of  images  modelled  by  Markov  random  fields 
corrupted  with  additive  noise.  Ideally,  the  algorithm  yields  the  a  posteriori 
distribution  of  the  scene  at  each  pixel,  based  on  the  whole  noisy  image. 
Computational  concerns,  however,  necessitate  certain  simplifying  assumptions 
on  the  model  and  some  approximations  during  the  implementation  of  the  algor¬ 
ithm.  In  particular,  the  scene  is  modelled  as  a  Markov  mesh  random  field, 
a  special  class  of  Markov  random  fields,  which  allows  for  a  causal  condi¬ 
tional  distribution  characterization.  Various  properties  of  the  Markov 
mesh  random  fields,  their  relationship  to  Markov  random  fields  and  to  other 
classes  of  Markov  random  fields  are  investigated. 

The  algorithm  is  implemented  using  strip  processing  approach,  where 
the  estimate  for  each  pixel  value  is  determined  based  on  all  the  data  over 
a  strip.  This,  near  optimal  version  of  the  algorithm  is  applied  to  some 
test  Images  of  various  noise  levels  and  to  some  remotely  sensed  SAR  data. 

The  algorithm  yields  excellent  segmentation  results  for  high  to  medium  SNR's 
and  reasonably  good  results  for  low  to  very-low  SRN's  (up  to  0.5). 

Some  areas  for  future  work  include  extending  the  algorithm  to  segment 
multilevel  images  and  to  employ  wider  strips.  As  the  strip  width  is  increased 
the  algorithm  tends  to  the  optimal  Bayes  estimate  of  each  pixel  based  on  the 
whole  image.  It  is  also  of  interest,  to  explore  ways  of  determining  the 
vector  Markov  chain  tr  nsition  probability  distribution  according  to  some 


criterion.  The  transition  distribution  is  to  be  determined  using  the 
noisy  image  and  it  should  closely  represent  the  scenes  of  interest. 
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